Developing the employment heatmap visualization

Canadian sentiment is currently at a low, with a high cost of living, global political instability, and sweeping layoffs across multiple sectors. For the 2025 plotnine contest, I wanted to explore the latest official Canadian labour statistics using plotnine, a data visualization library for Python.

Introduction

I am so happy that plotnine, a relatively new Python data visualization package, exists. plotnine is based on ggplot2, an R package that I have been using for almost a decade.

In this tutorial, I’ll walk through the process of creating my plotnine 2025 contest submission. The plot shows employment across Canadian industries, ranked by their percent change in monthly employment. To help visualize data across different industries, industry-specific plots are laid out in a “pseudo” interactive manner.

Setup

Data

The data can be downloaded using this bash script, or directly from StatCan’s website.

Parameters

In this initial code chunk we set a few parameters so that, if needed, we can later rerun this entire notebook with different values (e.g. different years).

pyprojroot is similar to R's here package, which lets us construct file paths relative to the project root. This is especially convenient for Quarto projects with complex file organization.

from pyprojroot import here
LABOUR_DATA_FILE = here() / "data" / "14100355.csv"
FIGURE_THEME_SIZE = (8, 6)
FILTER_YEAR = (2018, 2025)
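pyprojroot handles root detection for us. The idea behind here() can be sketched with pathlib: walk up from a starting directory until a marker file identifies the project root. This is a simplified sketch, not pyprojroot's implementation. The helper name find_project_root and the marker list are assumptions, and pyprojroot's real root criteria are configurable and more thorough.

```python
from pathlib import Path

# Hypothetical marker files; pyprojroot uses its own (configurable) criteria.
ROOT_MARKERS = (".here", ".git", "pyproject.toml")


def find_project_root(start: Path, markers=ROOT_MARKERS) -> Path:
    """Return the first directory at or above `start` that contains a marker file."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in markers):
            return directory
    raise FileNotFoundError(f"No project root found above {start}")


# Usage, analogous to here() / "data" / "14100355.csv":
# labour_file = find_project_root(Path.cwd()) / "data" / "14100355.csv"
```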

Libraries

# Data manipulation
import polars as pl
import polars.selectors as cs

# Visualization
from plotnine import *

# Mizani helps customize the text and breaks on axes
from mizani.bounds import squish
import mizani.labels as ml
import mizani.breaks as mb
import textwrap  # for wrapping long lines of text

# Custom extract and transform functions for plot data
from labourcan.data_processing import read_labourcan, calculate_centered_rank

Read and process data for graphing

The visualization required a fair amount of data processing, which is detailed on this page. The steps are summarized here:

read_labourcan returns a polars.DataFrame with:

  • Unused columns removed
  • Filtered to seasonally adjusted estimates only
  • Filtered to Canada level estimates
  • Additional YEAR, MONTH, and DATE_YMD columns extracted from REF_DATE
  • Sorted chronologically by year and month
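read_labourcan itself lives in the project's labourcan package. The function below is a rough, dependency-free sketch of the same steps, using the csv module on a tiny hypothetical excerpt. Several details are assumptions about the StatCan file: the column names GEO, Data type and Industry, and the YYYY-MM format of REF_DATE. read_labour_rows is a hypothetical helper, not the project's code.

```python
import csv
import io
from datetime import date

# Hypothetical excerpt; the real StatCan file has many more columns and rows,
# and these column names are assumptions.
SAMPLE_CSV = """REF_DATE,GEO,Data type,Industry,VALUE
2025-02,Canada,Seasonally adjusted,Construction,1600.5
2025-01,Canada,Seasonally adjusted,Construction,1598.2
2025-01,Ontario,Seasonally adjusted,Construction,620.1
2025-01,Canada,Unadjusted,Construction,1540.0
"""

KEEP_COLUMNS = ("REF_DATE", "Industry", "VALUE")


def read_labour_rows(text: str) -> list[dict]:
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Keep seasonally adjusted, Canada-level estimates only.
        if row["Data type"] != "Seasonally adjusted" or row["GEO"] != "Canada":
            continue
        # Drop unused columns.
        kept = {key: row[key] for key in KEEP_COLUMNS}
        kept["VALUE"] = float(kept["VALUE"])
        # Extract YEAR, MONTH and DATE_YMD from REF_DATE (assumed "YYYY-MM").
        year, month = (int(part) for part in row["REF_DATE"].split("-"))
        kept.update(YEAR=year, MONTH=month, DATE_YMD=date(year, month, 1))
        rows.append(kept)
    # Sort chronologically by year and month.
    return sorted(rows, key=lambda r: (r["YEAR"], r["MONTH"]))
```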

See labour.qmd for details on data processing.

labour = read_labourcan(LABOUR_DATA_FILE)
labour_processed = calculate_centered_rank(labour)

A first attempt

The visual developed here is essentially a heatmap of monthly changes in employment.

We want a clean separation of industries that are growing or shrinking. For that, we rank industries by their monthly % change. But not just any ordinary rank: we center it around 0 so that sectors that are growing (% change > 0) have a positive rank and those that are shrinking have a negative rank.
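Here is one way to compute such a centered rank for a single month, sketched in plain Python:

- Growing industries are ranked 1, 2, … from smallest to largest gain.
- Shrinking industries are ranked -1, -2, … from smallest to largest loss.
- Unchanged industries get 0.

This is an illustrative interpretation, not the calculate_centered_rank function used here. In particular, its handling of exact zeros and ties is an assumption, and it assumes no missing values.

```python
def centered_rank(pdiffs: dict[str, float]) -> dict[str, int]:
    """Centered rank of each industry's % change within one month.

    Ties keep their input order (Python's sort is stable).
    """
    growing = sorted((k for k, v in pdiffs.items() if v > 0), key=lambda k: pdiffs[k])
    # Smallest loss first, so it gets -1 and the largest loss gets the most negative rank.
    shrinking = sorted(
        (k for k, v in pdiffs.items() if v < 0), key=lambda k: pdiffs[k], reverse=True
    )
    ranks = {k: 0 for k, v in pdiffs.items() if v == 0}
    ranks.update({k: i for i, k in enumerate(growing, start=1)})
    ranks.update({k: -i for i, k in enumerate(shrinking, start=1)})
    return ranks


# Hypothetical month of data:
example = {"Construction": 0.012, "Retail": -0.004, "Finance": 0.003,
           "Mining": -0.02, "Utilities": 0.0}
print(centered_rank(example))
```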

scale_color_gradient2 is a great option because it lets us specify midpoint=0.

(
    ggplot(
        (
            labour_processed.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", color="PDIFF"),
    )
    + geom_point(shape="s")
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
    + scale_color_gradient2(
        limits=(-0.01, 0.01), low="#ff0000ff", high="#0000dbff", midpoint=0, oob=squish
    )
)
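Here is what oob=squish does with limits=(-0.01, 0.01): values outside the limits are clamped to the nearest limit rather than dropped. Months with a change beyond ±1% therefore simply get the end colors of the gradient. Without squish, plotnine's continuous scales censor out-of-range values, which treats them as missing. Below is a minimal plain-Python equivalent. squish_values is a hypothetical name; mizani.bounds.squish is the real function used above.

```python
def squish_values(values, limits=(-0.01, 0.01)):
    """Clamp each value into [low, high], mimicking oob=squish for finite values."""
    low, high = limits
    return [min(max(v, low), high) for v in values]


print(squish_values([-0.05, -0.005, 0.0, 0.02]))
```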

geom_point or geom_tile

The whitespace between points is distracting. I could make the points larger, but how much whitespace remains ultimately depends on the point size relative to the ranges of the x and y axes, and on the figure size.

If we use geom_tile instead, which will plot rectangles specified by a center point, we can explicitly control the whitespace between tiles.

(
    ggplot(
        (
            labour_processed.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF"),
    )
    + geom_tile(height=0.95, width=30 * 0.95)  # width in days, height in rank units
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
    + scale_fill_gradient2(
        limits=(-0.01, 0.01), low="#ff0000ff", high="#0000dbff", midpoint=0, oob=squish
    )
)

I added height=0.95 to leave some whitespace between tiles vertically. To get rid of the excess horizontal whitespace, we also need to specify a width. Because the x axis is a datetime axis, the width is given in days. Each tile represents one month (roughly 30 days), hence width=30 * 0.95.
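Calendar months are 28 to 31 days long, so a fixed width of 30 * 0.95 = 28.5 days does not leave a perfectly uniform gap. The sketch below computes the leftover gap between neighboring tiles, assuming each tile is centered on the first day of its month (DATE_YMD). month_starts and tile_gaps_in_days are hypothetical helpers.

```python
from datetime import date


def month_starts(first_year: int, last_year: int) -> list[date]:
    """First day of every month from first_year to last_year inclusive."""
    return [date(y, m, 1) for y in range(first_year, last_year + 1) for m in range(1, 13)]


def tile_gaps_in_days(starts: list[date], tile_width_days: float) -> list[float]:
    """Gap between consecutive tiles: center-to-center distance minus tile width.

    A negative gap means two neighboring tiles overlap.
    """
    return [(b - a).days - tile_width_days for a, b in zip(starts, starts[1:])]


gaps = tile_gaps_in_days(month_starts(2018, 2025), tile_width_days=30 * 0.95)
print(f"smallest gap: {min(gaps):.1f} days, largest gap: {max(gaps):.1f} days")
```

For 2018 to 2025 this reports a smallest gap of -0.5 days (a 28-day February tile overlaps the next tile by half a day) and a largest gap of 2.5 days. Both are likely too small to notice at this figure size.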

Explicit color mapping with scale_fill_manual

I am fairly happy with scale_fill_gradient2 combined with squish: we get a really nice palette centered around 0. However, scale_fill_gradient2 is limited to three colors (low, mid and high), which doesn't quite allow the more dynamic color palette that I'm seeking.

To be more explicit with the colors, I will bin the % change variable and then map each bin to a color manually using scale_fill_manual.

Bin with polars.Expr.cut

labour_processed_cutted = (
    labour_processed.with_columns(
        pl.col("PDIFF")
        .cut(
            [
                -0.05,
                -0.025,
                -0.012,
                -0.0080,
                -0.0040,
                0,
                0.0040,
                0.0080,
                0.012,
                0.025,
                0.05,
            ]
        )
        .alias("PDIFF_BINNED")
    )
    .with_columns(
        pl.when(pl.col("PDIFF") == 0)
        .then(pl.lit("0"))
        .otherwise(pl.col("PDIFF_BINNED"))
        .alias("PDIFF_BINNED")
    )
    .sort("PDIFF")
)
labour_processed_cutted.group_by("PDIFF_BINNED").len()
shape: (14, 2)
PDIFF_BINNED          len
cat                   u32
null                  21
"(0.004, 0.008]"      1736
"(0.012, 0.025]"      1292
"(-0.012, -0.008]"    717
"(-0.025, -0.012]"    892
"(0.025, 0.05]"       315
"(-inf, -0.05]"       47
"(-0.004, 0]"         1999
"(0.05, inf]"         58
"(-0.008, -0.004]"    1201

(
    ggplot(
        (
            labour_processed_cutted.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(height=0.95)  # whitespace between tiles, vertically
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
)
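To make the binning concrete: cut produces right-closed intervals (a, b] by default, so a value equal to a break falls in the bin that ends at that break. The plain-Python sketch below reproduces the labels shown in the output above, including the dedicated "0" bin. bin_label is a hypothetical helper. Its string formatting is assumed to match the polars labels for these particular breaks, which the output above suggests it does.

```python
from bisect import bisect_left

BREAKS = [-0.05, -0.025, -0.012, -0.008, -0.004, 0, 0.004, 0.008, 0.012, 0.025, 0.05]


def bin_label(value: float, breaks=BREAKS) -> str:
    """Label `value` with its right-closed interval (a, b], or "0" for exact zeros."""
    if value == 0:
        return "0"  # exact zeros get their own bin, like the pl.when(...) step
    # bisect_left finds the first break >= value, so breaks[i-1] < value <= breaks[i].
    i = bisect_left(breaks, value)
    lower = "-inf" if i == 0 else str(breaks[i - 1])
    upper = "inf" if i == len(breaks) else str(breaks[i])
    return f"({lower}, {upper}]"


print([bin_label(v) for v in (-0.06, -0.004, -0.001, 0, 0.001, 0.06)])
```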

scale_fill_manual for explicit color mapping

Now we need to order the levels, and map to a specific color palette.

We will color PDIFF = 0% (no change) gray, positive values green and blue (growth = good), and negative values red and orange (contraction = bad).

order = (
    labour_processed_cutted.drop_nulls()
    .sort("PDIFF")
    .select(pl.col("PDIFF_BINNED"))
    .unique(maintain_order=True)
    .to_series()
    .to_list()
)

labour_processed_cutted_ordered = labour_processed_cutted.with_columns(
    pl.col("PDIFF_BINNED").cast(pl.Enum(order))
)

color_mapping = {
    "(-inf, -0.05]": "#d82828ff",
    "(-0.05, -0.025]": "#fa6f1fff",
    "(-0.025, -0.012]": "#f1874aff",
    "(-0.012, -0.008]": "#f1b274ff",
    "(-0.008, -0.004]": "#FEE08B",
    "(-0.004, 0]": "#FFFFBF",
    "0": "#a8a8a8ff",
    "(0, 0.004]": "#E6F5D0",
    "(0.004, 0.008]": "#bce091ff",
    "(0.008, 0.012]": "#9ad65fff",
    "(0.012, 0.025]": "#78b552ff",
    "(0.025, 0.05]": "#5cb027ff",
    "(0.05, inf]": "#1f6fc6ff",
}

(
    ggplot(
        (
            labour_processed_cutted.filter(
                pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
            )
        ),
        # map fill to the binned % change
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(color="white")
    + theme_tufte()
    + theme(figure_size=FIGURE_THEME_SIZE, axis_text_x=element_text(angle=90))
    # provide the explicit color mapping to scale_fill_manual
    + scale_fill_manual(values=color_mapping, breaks=order)
)

The power of scale_fill_manual is that it gives much more explicit control over how colors are mapped to data. The cost is that it takes more effort and more lines of code than scale_fill_gradient2, which works well out of the box.
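Some of that effort goes into keeping the color_mapping keys in sync with the bin labels, and a typo in a key would leave that bin without its intended color. One way to reduce this risk is to generate the expected labels from the breaks and check the mapping against them. This is a minimal sketch: ordered_bin_labels and missing_colors are hypothetical helpers, and the label format is assumed to match the polars output shown earlier.

```python
BREAKS = [-0.05, -0.025, -0.012, -0.008, -0.004, 0, 0.004, 0.008, 0.012, 0.025, 0.05]


def ordered_bin_labels(breaks: list[float]) -> list[str]:
    """All bin labels from lowest to highest, including the dedicated "0" bin."""
    edges = ["-inf", *(str(b) for b in breaks), "inf"]
    labels = [f"({low}, {high}]" for low, high in zip(edges, edges[1:])]
    # labels[i] ends at breaks[i], so the interval ending at 0 is labels[breaks.index(0)];
    # the "0" bin goes right after it, matching the order obtained by sorting on PDIFF.
    labels.insert(breaks.index(0) + 1, "0")
    return labels


def missing_colors(color_mapping: dict[str, str], breaks: list[float]) -> list[str]:
    """Bins with no entry in color_mapping, e.g. because a key was mistyped."""
    return [label for label in ordered_bin_labels(breaks) if label not in color_mapping]


print(ordered_bin_labels(BREAKS))
```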

The legend

…is mathematically accurate, but we can make it nicer to look at.

First, let's make the text more concise: not every bin needs a label, and instead of listing each range we can describe the bin's midpoint.

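legend_labels = [
    "-5%",  # (-inf, -0.05]: the end bins are labelled with their boundary, i.e. -5% or lower
    "",  # (-0.05, -0.025]
    "",  # (-0.025, -0.012]
    "-1%",  # (-0.012, -0.008]
    "",  # (-0.008, -0.004]
    "",  # (-0.004, 0]
    "No change",  # 0
    "",  # (0, 0.004]
    "",  # (0.004, 0.008]
    "1%",  # (0.008, 0.012], mirroring -1%
    "",  # (0.012, 0.025]
    "",  # (0.025, 0.05]
    "5%",  # (0.05, inf]: 5% or higher
]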
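The labels are matched to bins purely by position, so it is easy to shift one by a slot. An alternative is to key the labels by bin and expand them into the positional list that scale_fill_manual receives. This is a hedged sketch: sparse_legend_labels, LABELS_BY_BIN and keyed_legend_labels are hypothetical names.

```python
# Bin order from lowest to highest, as produced by sorting on PDIFF.
ORDER = [
    "(-inf, -0.05]", "(-0.05, -0.025]", "(-0.025, -0.012]", "(-0.012, -0.008]",
    "(-0.008, -0.004]", "(-0.004, 0]", "0", "(0, 0.004]", "(0.004, 0.008]",
    "(0.008, 0.012]", "(0.012, 0.025]", "(0.025, 0.05]", "(0.05, inf]",
]

# Only the bins that should carry visible text; every other bin gets "".
LABELS_BY_BIN = {
    "(-inf, -0.05]": "-5%",
    "(-0.012, -0.008]": "-1%",
    "0": "No change",
    "(0.008, 0.012]": "1%",
    "(0.05, inf]": "5%",
}


def sparse_legend_labels(order: list[str], labels_by_bin: dict[str, str]) -> list[str]:
    """Expand a {bin: label} dict into one label per bin, in legend order."""
    unknown = set(labels_by_bin) - set(order)
    if unknown:
        raise ValueError(f"Labels given for unknown bins: {sorted(unknown)}")
    return [labels_by_bin.get(bin_name, "") for bin_name in order]


keyed_legend_labels = sparse_legend_labels(ORDER, LABELS_BY_BIN)
```

A misspelled bin key now raises an error instead of quietly labelling the wrong tile color.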

(
    ggplot(
        labour_processed_cutted.filter(
            pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(color="white")
    + theme_tufte()
    + theme(
        figure_size=FIGURE_THEME_SIZE,
        axis_text_x=element_text(angle=90),
        legend_justification_right=1,
        legend_position="right",
        legend_text_position="right",
        legend_title=element_blank(),
        legend_key_spacing=0,
        legend_key_width=10,
        legend_key_height=10,
        legend_text=element_text(size=8),
    )
    # provide the list legend_labels to scale_fill_manual
    + scale_fill_manual(values=color_mapping, breaks=order, labels=legend_labels)
)

I originally wanted to make a horizontal legend, but this works much better.

Text and fonts

Next up: text and fonts. I tried a few fonts from Google Fonts before settling on two. Note that this website also uses these fonts, with the help of brand.yml.

Install the fonts:

FONT_PRIMARY = "Playfair Display"
FONT_SECONDARY = "Lato"
import mpl_fontkit as fk
fk.install(FONT_PRIMARY)
fk.install(FONT_SECONDARY)
Font name: `Playfair Display`
Font name: `Lato`

mizani for axis breaks and labels

Scale breaks and labels in plotnine can be easily adjusted using mizani, which plays the same role for plotnine that the scales package plays for ggplot2.

We're going to use mizani.breaks.breaks_date_width to place a break at each year, and mizani.labels.label_date to drop the month part of the date labels.
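The two mizani helpers split the work: one picks the break positions, the other formats them. Below is a plain-Python approximation of yearly breaks with year-only labels. yearly_breaks and year_labels are hypothetical helpers, and mizani's own breaks_date_width may choose breaks differently near the edges of the data range.

```python
from datetime import date


def yearly_breaks(start: date, end: date) -> list[date]:
    """January 1 of every year that falls within [start, end]."""
    first_year = start.year if (start.month, start.day) == (1, 1) else start.year + 1
    return [date(year, 1, 1) for year in range(first_year, end.year + 1)]


def year_labels(breaks: list[date]) -> list[str]:
    """Format each break like label_date("%Y"): keep only the year."""
    return [d.strftime("%Y") for d in breaks]


breaks = yearly_breaks(date(2018, 1, 1), date(2025, 6, 1))
print(year_labels(breaks))
```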

import mizani.labels as ml
import mizani.breaks as mb

plot = (
    ggplot(
        labour_processed_cutted.filter(
            pl.col("YEAR") >= FILTER_YEAR[0], pl.col("YEAR") <= FILTER_YEAR[1]
        ),
        aes(x="DATE_YMD", y="centered_rank_across_industry", fill="PDIFF_BINNED"),
    )
    + geom_tile(color="white", height=0.95)
    + theme_tufte()
    + theme(
        # apply the primary font to all text by default
        text=element_text(family=FONT_PRIMARY),
        figure_size=FIGURE_THEME_SIZE,
        axis_text_y=element_text(family=FONT_SECONDARY),
        axis_text_x=element_text(family=FONT_SECONDARY),
        axis_title_y=element_text(weight=300),
        legend_justification_right=1,
        legend_position="right",
        legend_text_position="right",
        legend_title_position="top",
        legend_key_spacing=0,
        legend_key_width=15,
        legend_key_height=15,
        legend_text=element_text(size=8, family=FONT_SECONDARY),
        legend_title=element_blank(),
        plot_title=element_text(ha="left"),
        plot_subtitle=element_text(ha="left", margin={"b": 1, "units": "lines"}),
    )
    + scale_fill_manual(values=color_mapping, breaks=order, labels=legend_labels)
    + guides(fill=guide_legend(ncol=1, reverse=True))
    + scale_x_datetime(
        # use mizani to show only the year on the x axis
        labels=ml.label_date("%Y"),
        expand=(0, 0),
        breaks=mb.breaks_date_width("1 years"),
    )
    # add title and subtitle; textwrap wraps the long subtitle
    + labs(
        title="Sector Shifts: Where Canada's Jobs Are Moving",
        subtitle=textwrap.fill(
            "Track the number of industries gaining or losing jobs each month. Boxes are shaded based on percentage change from previous month in each industry's employment levels.",
            width=75,
        ),
        x="",
        y="< SECTORS FALLING            SECTORS RISING >",
    )
)
plot
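textwrap.fill inserts newlines so that no line exceeds width characters, which keeps the long subtitle from running off the plot. A quick standalone check of what it does to the subtitle string:

```python
import textwrap

SUBTITLE = (
    "Track the number of industries gaining or losing jobs each month. "
    "Boxes are shaded based on percentage change from previous month in "
    "each industry's employment levels."
)

wrapped = textwrap.fill(SUBTITLE, width=75)
print(wrapped)
```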

Highlighting an Industry

For more industry-specific insights, I would like to see where each individual industry ranks in the graphic.

# Specify the industry to highlight
INDUSTRY = 'Wholesale and retail trade [41, 44-45]'

# Subset the data to that industry
plot_data_subsetted = labour_processed_cutted.filter(
    pl.col("YEAR") >= FILTER_YEAR[0],
    pl.col("YEAR") <= FILTER_YEAR[1],
    pl.col('Industry') == INDUSTRY
)

(
    plot
    # Add the subsetted data as another geom_point layer
    + geom_point(data=plot_data_subsetted, color='black', fill='black')
)
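To lay out one highlighted plot per industry, each figure needs a file-safe name. Below is a minimal sketch of a hypothetical industry_slug helper that drops the bracketed NAICS codes and hyphenates the rest. The commented loop shows how it might be combined with plotnine's ggplot.save. That loop is a hypothetical usage, assuming the Industry column and an existing figures/ directory.

```python
import re


def industry_slug(industry: str) -> str:
    """'Wholesale and retail trade [41, 44-45]' -> 'wholesale-and-retail-trade'."""
    name = re.sub(r"\s*\[[^\]]*\]", "", industry)  # drop bracketed NAICS codes
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


# Hypothetical usage with the objects defined above (not run here):
# for industry in labour_processed_cutted["Industry"].unique().to_list():
#     subset = labour_processed_cutted.filter(
#         pl.col("YEAR").is_between(*FILTER_YEAR), pl.col("Industry") == industry
#     )
#     highlighted = plot + geom_point(data=subset, color="black", fill="black")
#     highlighted.save(f"figures/{industry_slug(industry)}.png", verbose=False)
```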

Line plot of unemployment

Appendix

Things that didn’t work

This section is a non-exhaustive list of design elements I wasn't able to solve with plotnine.

https://ggplot2.tidyverse.org/reference/geom_tile.html#aesthetics

Horizontal legend with horizontal legend text

Initially I wanted a horizontal legend for the colors. But to remove the whitespace between keys, I discovered that the text needs to be smaller than the legend keys; otherwise it “pushes” the legend keys apart in an uneven manner. I tried (unsuccessfully) to address this by making the legend text small, eliminating as much text as possible (e.g. removing the “%” characters for -0.50 and 0.50), and finally increasing the legend key size.

But it still didn’t really work out the way I hoped, so I stuck with a vertical legend instead.